Hybrid weighted random forests for classifying very high-dimensional data
Abstract
Random forests are a popular classification method based on an ensemble of decision trees of a single type, each grown on a subspace of the data. The literature describes many decision tree algorithms, including C4.5, CART, and CHAID, and each type may capture different information and structure. This paper proposes a hybrid weighted random forest algorithm that combines a feature weighting method with a hybrid forest method to classify very high-dimensional data. The hybrid weighted random forest algorithm can effectively reduce subspace size and improve classification performance without increasing the error bound. We conduct a series of experiments on eight high-dimensional datasets to compare our method with traditional random forest methods and other classification methods. The results show that our method consistently outperforms these traditional methods.
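To make the two ideas in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes each feature already has a positive weight (the abstract does not say how weights are computed) and that tree types are assigned round-robin, which is also an assumption. The names `sample_weighted_subspace` and `plan_hybrid_forest` are hypothetical. The sketch only plans which tree type and which feature subspace each ensemble member would use; it does not train any trees.

```python
import random


def sample_weighted_subspace(weights, size, rng):
    """Draw `size` distinct feature indices, favouring larger weights.

    `weights` are assumed to be positive per-feature scores; how they are
    obtained is outside this sketch.
    """
    if any(w <= 0 for w in weights):
        raise ValueError("all feature weights must be positive")
    candidates = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(min(size, len(candidates))):
        # random.choices samples with replacement, so draw one position at a
        # time and remove it to keep the subspace free of duplicates.
        pick = rng.choices(range(len(candidates)), weights=remaining, k=1)[0]
        chosen.append(candidates.pop(pick))
        remaining.pop(pick)
    return chosen


def plan_hybrid_forest(weights, n_trees, subspace_size, tree_types, seed=0):
    """Return a list of (tree_type, feature_indices) pairs, one per tree.

    Tree types are cycled round-robin (an assumption made for illustration),
    and each tree gets its own weighted feature subspace.
    """
    rng = random.Random(seed)
    plan = []
    for i in range(n_trees):
        tree_type = tree_types[i % len(tree_types)]
        features = sample_weighted_subspace(weights, subspace_size, rng)
        plan.append((tree_type, features))
    return plan
```

Calling `plan_hybrid_forest([0.5, 0.1, 0.1, 0.2, 0.1], n_trees=6, subspace_size=3, tree_types=("C4.5", "CART", "CHAID"))` would yield six plans, two per tree type. Because informative features are drawn more often, a smaller subspace can still contain useful features, which is the intuition behind reducing subspace size.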
Similar resources
Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees
The random forests method is one of the most successful ensemble methods. However, random forests do not perform well on very-high-dimensional data in the presence of dependencies. In this case one can expect many combinations between the variables to exist, and unfortunately the usual random forests method does not effectively exploit this situation. We here investig...
REGRESSION LEAF FOREST: A FAST AND ACCURATE LEARNING METHOD FOR LARGE & HIGH DIMENSIONAL DATA SETS by SIVANESAN GANESAN
A number of learning methods provide solutions to classification and regression problems, including Linear Regression, Decision Trees, KNN, and SVMs. These methods work well in many applications, but they are challenged by real-world problems that are noisy, nonlinear, or high-dimensional. Furthermore, missing data (e.g., missing historical features of companies in stock data), i...
Random Forests and Adaptive Nearest Neighbors
In this paper we study random forests through their connection with a new framework of adaptive nearest neighbor methods. We first introduce a concept of potential nearest neighbors (k-PNN’s) and show that random forests can be seen as adaptively weighted k-PNN methods. Various aspects of random forests are then studied from this perspective. We investigate the effect of terminal node sizes and...
Random Forests with Missing Values in the Covariates
In Random Forests [2] several trees are constructed from bootstrap samples or subsamples of the original data. Random Forests have become very popular, e.g., in the fields of genetics and bioinformatics, because they can deal with high-dimensional problems including complex interaction effects. Conditional Inference Forests [8] provide an implementation of Random Forests with unbiased variable selection...
Extensions to Quantile Regression Forests for Very High-Dimensional Data
This paper describes new extensions to Quantile Regression Forests (QRF), a state-of-the-art regression random forest method, for applications to high-dimensional data with thousands of features. We propose a new subspace sampling method that randomly samples a subset of features from two separate feature sets, one containing important features and the other containing less important features. Th...